Distance Transformation for Effective Dimension Reduction of High-Dimensional Data

Authors

  • Eniko Szekely
  • Stephane Marchand-Maillet

Abstract

In this paper we address the problem of high dimensionality for data that lie on complex manifolds. In high-dimensional spaces, the distances from a point to its nearest and to its farthest neighbour tend to become equal. This behaviour makes data analysis tasks such as clustering harder. We show that a distance transformation can be used effectively to obtain an embedding space of lower dimensionality than the original space, one that also increases the quality of data analysis. The new method, called High-Dimensional Multimodal Embedding (HDME), is compared with known state-of-the-art methods operating in high-dimensional spaces and is shown to be effective for both retrieval and clustering on real-world data.
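
The distance-concentration effect described in the abstract can be illustrated with a short Python sketch. It assumes synthetic points drawn uniformly from the unit hypercube and measures the relative contrast (D_max - D_min) / D_min between the farthest and the nearest neighbour of a single random query. The function name `relative_contrast` and its parameters are hypothetical. The sketch only illustrates the problem; it is not the HDME method, whose distance transformation is not specified in this abstract.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Hypothetical helper: return (D_max - D_min) / D_min for one random
    query against n_points points drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    # A small ratio means nearest and farthest neighbours are almost equally far away.
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Running the script should show the contrast shrinking as the dimension grows: in two dimensions the nearest neighbour is far closer than the farthest one, while in a thousand dimensions all distances are of nearly the same size, which is what makes nearest-neighbour retrieval and distance-based clustering less discriminative.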

Similar resources

Distance Preserving Dimension Reduction for Manifold Learning

Manifold learning is an effective methodology for extracting nonlinear structures from high-dimensional data, with many applications in image analysis, computer vision, text data analysis and bioinformatics. The focus of this paper is on developing algorithms for reducing the computational complexity of manifold learning algorithms. In particular, we consider the case when the number of features...

Unsupervised Dimension Reduction of High-Dimensional Data for Cluster Preservation

High-dimensional data is receiving increasing attention in more and more application fields, but the analysis of such data has proven difficult due to the "curse of dimensionality". Dimension reduction methods have emerged as successful tools to overcome the problem of high dimensionality. However, even if they are designed to preserve the most important properties of the data, they are ge...

Dimension Reduction for Linear Separation with Curvilinear Distances

Any high-dimensional data in its original raw form may contain obviously classifiable clusters that are difficult to identify given its high-dimensional representation. By reducing the dimensions it may be possible to apply a simple classification technique to extract this cluster information while retaining the overall topology of the data set. The supervised method presented here takes a hi...

High-Throughput Multi-dimensional Scaling (HiT-MDS): New Variant of MDS

Visualization is a useful tool for data analysis, especially when the data is unknown. However, when the dimension is huge, producing a robust visualization is difficult, so dimensionality reduction is needed. Multi-dimensional Scaling (MDS) is one of the best techniques for dimension reduction, and in this paper one of its variants, which is focused on high-throughput data, ca...

A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high-dimensional data increase the complexity of the underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...

Publication date: 2009